@NaluTripician
Contributor

@NaluTripician NaluTripician commented Oct 13, 2025

Pull Request Template

Description

This pull request introduces a new semantic reranking feature to the Azure Cosmos DB .NET SDK, enabling users to rerank documents using an inference service that leverages Azure Active Directory (AAD) authentication. The main changes include the addition of the InferenceService class, new API surface for semantic reranking, and appropriate integration into the SDK's authorization and client context infrastructure. Notably, this functionality is only available when using AAD authentication.

Semantic Reranking Feature Integration:

  • Added the InferenceService class, which handles communication with the Cosmos DB Inference Service for semantic reranking, including HTTP client configuration, payload construction, and response handling. This service enforces AAD authentication and manages its own authorization and disposal.
  • Introduced a new API, SemanticRerankAsync, on the Container class (public in PREVIEW builds, internal otherwise), allowing users to rerank a list of documents against a context/query string. It is implemented in ContainerInlineCore and routed through the client context. [1] [2]

Authorization and Token Handling Updates:

  • Extended the AuthorizationTokenProvider abstraction and its implementations to support a new method, AddInferenceAuthorizationHeaderAsync, which is only valid for AAD-based token providers. Non-AAD providers throw a NotImplementedException for this method. [1] [2] [3] [4] [5] [6]
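
The split described above (AAD providers add the header, all others throw) can be sketched as follows. This is a minimal illustration, not the SDK's code: the type names TokenProviderSketch, AadTokenProviderSketch, and KeyTokenProviderSketch, the single HttpRequestMessage parameter, the token delegate, and the use of a "Bearer" scheme are all assumptions made for the example. Only the method name AddInferenceAuthorizationHeaderAsync and the NotImplementedException behavior come from the description.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

// Hypothetical stand-in for the SDK's AuthorizationTokenProvider abstraction.
public abstract class TokenProviderSketch
{
    // The real method's parameter list is not shown in the PR excerpt;
    // a single request parameter is assumed here.
    public abstract ValueTask AddInferenceAuthorizationHeaderAsync(HttpRequestMessage request);
}

// AAD-style provider: attaches a token obtained from a caller-supplied delegate.
public sealed class AadTokenProviderSketch : TokenProviderSketch
{
    private readonly Func<ValueTask<string>> getToken;

    public AadTokenProviderSketch(Func<ValueTask<string>> getToken)
    {
        this.getToken = getToken ?? throw new ArgumentNullException(nameof(getToken));
    }

    public override async ValueTask AddInferenceAuthorizationHeaderAsync(HttpRequestMessage request)
    {
        string token = await this.getToken();
        // Assumption: the inference endpoint accepts a standard bearer token.
        request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", token);
    }
}

// Key-based provider: inference is unsupported, so the call fails fast.
public sealed class KeyTokenProviderSketch : TokenProviderSketch
{
    public override ValueTask AddInferenceAuthorizationHeaderAsync(HttpRequestMessage request)
    {
        throw new NotImplementedException("Inference requests require AAD authentication.");
    }
}
```

One consequence of this shape is that every non-AAD provider must carry a method it cannot implement, which is the trade-off a reviewer raises later in the thread.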

Client Context and Resource Management:

  • Updated ClientContextCore and CosmosClientContext to manage the lifecycle of the InferenceService, including creation, caching, and disposal. Added methods for invoking semantic reranking and for retrieving or creating the inference service instance. [1] [2] [3] [4] [5] [6]
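
One way to get the "create once, cache, dispose with the owner" lifecycle described above is a thread-safe Lazy<T> held by the context. The sketch below is an assumption about how such caching could look, not the PR's implementation; InferenceServiceSketch, ClientContextSketch, and GetOrCreateInferenceService are hypothetical names.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for the InferenceService; it only tracks disposal.
public sealed class InferenceServiceSketch : IDisposable
{
    public bool IsDisposed { get; private set; }

    public void Dispose() => this.IsDisposed = true;
}

// Hypothetical stand-in for the client context that owns the service.
public sealed class ClientContextSketch : IDisposable
{
    // ExecutionAndPublication guarantees the factory runs at most once,
    // even when several threads request the service concurrently.
    private readonly Lazy<InferenceServiceSketch> inferenceService =
        new Lazy<InferenceServiceSketch>(
            () => new InferenceServiceSketch(),
            LazyThreadSafetyMode.ExecutionAndPublication);

    private int disposed; // 0 = live, 1 = disposed

    public InferenceServiceSketch GetOrCreateInferenceService()
    {
        if (Volatile.Read(ref this.disposed) == 1)
        {
            throw new ObjectDisposedException(nameof(ClientContextSketch));
        }

        return this.inferenceService.Value;
    }

    public void Dispose()
    {
        // Only the first Dispose call does any work.
        if (Interlocked.Exchange(ref this.disposed, 1) == 1)
        {
            return;
        }

        // Dispose the service only if it was ever created.
        if (this.inferenceService.IsValueCreated)
        {
            this.inferenceService.Value.Dispose();
        }
    }
}
```

Because the factory runs at most once, any background work the service starts in its constructor is also started at most once per context.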

Dependency Updates:

  • Added a dependency on the Azure.Identity package in the test project to support AAD authentication scenarios.
Type of change

Please delete options that are not relevant.

  • [] New feature (non-breaking change which adds functionality)

Closing issues

To automatically close an issue: closes #IssueNumber

@NaluTripician NaluTripician marked this pull request as draft October 13, 2025 17:52
@NaluTripician NaluTripician marked this pull request as ready for review October 22, 2025 22:37
milismsft previously approved these changes Oct 22, 2025
@milismsft milismsft left a comment

Please try to address the potential for multiple background tasks related to the Inference object (and dispose of that task properly as well) :-)

Member

@aayush3011 aayush3011 left a comment

@NaluTripician LGTM. I added the comments that we discussed offline.

aayush3011 previously approved these changes Nov 7, 2025
Member

@aayush3011 aayush3011 left a comment

LGTM. Thanks Nalu

@NaluTripician
Contributor Author

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

AuthorizationTokenType tokenType,
ITrace trace);

public abstract ValueTask AddInferenceAuthorizationHeaderAsync(
Member

IMHO, let's not overload the core types with inference-specific members.

string containerLinkUri,
CancellationToken cancellationToken);

internal abstract Task<SemanticRerankResult> SemanticRerankAsync(
Member

Is this needed as part of the contract?
One option is to inline the implementation inside ContainerInlineCore.cs.

Contributor Author

This is needed because ContainerInlineCore calls the abstract CosmosClientContext class rather than its implementation.


// Create and configure HttpClient for inference requests.
HttpMessageHandler httpMessageHandler = CosmosHttpClientCore.CreateHttpClientHandler(
gatewayModeMaxConnectionLimit: client.DocumentClient.ConnectionPolicy.MaxConnectionLimit,
Member

Let's please isolate Inference settings/configurations.

RuntimeConstants.MediaTypes.Json);

// Send the request and ensure success.
HttpResponseMessage responseMessage = await this.httpClient.SendAsync(message, cancellationToken);
Member

What's the reliability story (e.g., retries)?

Contributor Author

Retries will come at a later date.
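
For context on what a retry layer around SendAsync could look like, here is a minimal sketch with exponential backoff. It is purely illustrative and not part of the PR: InferenceRetrySketch, SendWithRetryAsync, the request factory parameter, and the choice of retryable status codes (408, 429, 5xx) are all assumptions. A factory is used because an HttpRequestMessage cannot be sent twice.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public static class InferenceRetrySketch
{
    // Sends a fresh request per attempt and retries on status codes that are
    // commonly transient. Exceptions thrown by SendAsync are not retried here.
    public static async Task<HttpResponseMessage> SendWithRetryAsync(
        HttpClient client,
        Func<HttpRequestMessage> createRequest,
        int maxAttempts,
        TimeSpan baseDelay,
        CancellationToken cancellationToken)
    {
        if (maxAttempts < 1)
        {
            throw new ArgumentOutOfRangeException(nameof(maxAttempts));
        }

        for (int attempt = 1; ; attempt++)
        {
            HttpResponseMessage response;
            using (HttpRequestMessage request = createRequest())
            {
                response = await client.SendAsync(request, cancellationToken);
            }

            int status = (int)response.StatusCode;
            bool transient = response.StatusCode == HttpStatusCode.RequestTimeout
                || status == 429
                || status >= 500;

            // Return the last response as-is so the caller decides how to surface it.
            if (!transient || attempt == maxAttempts)
            {
                return response;
            }

            response.Dispose();

            // Exponential backoff: baseDelay, 2x, 4x, ...
            long delayTicks = baseDelay.Ticks * (1L << (attempt - 1));
            await Task.Delay(TimeSpan.FromTicks(delayTicks), cancellationToken);
        }
    }
}
```

A production policy would likely also honor a Retry-After header and cap the total delay; both are omitted to keep the sketch short.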

// Parse the rerank scores, latency, and token usage from the response.
return new SemanticRerankResult(
ParseRerankScores(responseJson["Scores"]),
responseJson.ContainsKey("latency") ? Newtonsoft.Json.JsonConvert.DeserializeObject<Dictionary<string, object>>(responseJson["latency"].ToString()) : null,
Member

Deserialize to a shadow or internal type instead?

return new SemanticRerankResult(
ParseRerankScores(responseJson["Scores"]),
responseJson.ContainsKey("latency") ? Newtonsoft.Json.JsonConvert.DeserializeObject<Dictionary<string, object>>(responseJson["latency"].ToString()) : null,
responseJson.ContainsKey("token_usage") ? Newtonsoft.Json.JsonConvert.DeserializeObject<Dictionary<string, object>>(responseJson["token_usage"].ToString()) : null,
Member

Any new top-level values will be missed, right?
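
The two review comments on this parsing code (deserialize to a dedicated type, and avoid dropping unknown top-level values) can be illustrated together. The sketch below swaps in System.Text.Json so that it runs without a package reference; the PR itself uses Newtonsoft.Json, which offers an equivalent [JsonExtensionData] attribute. RerankResponseSketch and its property names are hypothetical, the shape of "Scores" is not shown in the excerpt and is therefore kept as a raw JsonElement, and the type is public only so the example is easy to exercise (in the SDK it would presumably be internal).

```csharp
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical shadow type for the inference response payload.
public sealed class RerankResponseSketch
{
    // Shape unknown from the excerpt, so the raw JSON is preserved.
    [JsonPropertyName("Scores")]
    public JsonElement Scores { get; set; }

    [JsonPropertyName("latency")]
    public Dictionary<string, JsonElement> Latency { get; set; }

    [JsonPropertyName("token_usage")]
    public Dictionary<string, JsonElement> TokenUsage { get; set; }

    // Any top-level property without a matching member lands here
    // instead of being silently discarded.
    [JsonExtensionData]
    public Dictionary<string, JsonElement> AdditionalProperties { get; set; }

    public static RerankResponseSketch Parse(string json)
    {
        return JsonSerializer.Deserialize<RerankResponseSketch>(json);
    }
}
```

With this shape, a missing "latency" or "token_usage" property simply leaves the corresponding dictionary null, matching the null fallback in the snippet under review.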


5 participants